Search | WHO COVID-19 Research Database

PathoLive-Real-Time Pathogen Identification from Metagenomic Illumina Datasets.

Tausch, Simon H; Loka, Tobias P; Schulze, Jakob M; Andrusch, Andreas; Klenner, Jeanette; Dabrowski, Piotr Wojciech; Lindner, Martin S; Nitsche, Andreas; Renard, Bernhard Y.

Life (Basel) ; 12(9), 2022 Aug 30.

Article in English | MEDLINE | ID: covidwho-2006125

ABSTRACT

Over the past years, NGS has become a crucial workhorse for open-view pathogen diagnostics. Yet, long turnaround times result from using massively parallel high-throughput technologies as the analysis can only be performed after sequencing has finished. The interpretation of results can further be challenged by contaminations, clinically irrelevant sequences, and the sheer amount and complexity of the data. We implemented PathoLive, a real-time diagnostics pipeline for the detection of pathogens from clinical samples hours before sequencing has finished. Based on real-time alignment with HiLive2, mappings are scored with respect to common contaminations, low-entropy areas, and sequences of widespread, non-pathogenic organisms. The results are visualized using an interactive taxonomic tree that provides an easily interpretable overview of the relevance of hits. For a human plasma sample that was spiked in vitro with six pathogenic viruses, all agents were clearly detected after only 40 of 200 sequencing cycles. For a real-world sample from Sudan, the results correctly indicated the presence of Crimean-Congo hemorrhagic fever virus. In a second real-world dataset from the 2019 SARS-CoV-2 outbreak in Wuhan, we found the presence of a SARS coronavirus as the most relevant hit without the novel virus reference genome being included in the database. For all samples, clinically irrelevant hits were correctly de-emphasized. Our approach is valuable to obtain fast and accurate NGS-based pathogen identifications and correctly prioritize and visualize them based on their clinical significance. PathoLive is open source and available on GitLab and BioConda.
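The abstract mentions scoring mappings with respect to low-entropy areas but does not say how that score is computed. As a rough illustration only, the sketch below uses per-base Shannon entropy of the nucleotide composition as one common way to flag low-complexity sequence. The function names and the 1.0-bit threshold are assumptions made for this example and are not taken from PathoLive.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per base) of the nucleotide composition of seq."""
    seq = seq.upper()
    if not seq:
        return 0.0
    total = len(seq)
    # Sum p * log2(1/p) over the observed symbols.
    return sum(
        (n / total) * math.log2(total / n) for n in Counter(seq).values()
    )


def is_low_entropy(seq: str, threshold: float = 1.0) -> bool:
    """Flag a sequence as low-complexity (threshold is a hypothetical cutoff)."""
    return shannon_entropy(seq) < threshold


if __name__ == "__main__":
    for read in ("AAAAAAAT", "ACGTACGT"):
        print(read, round(shannon_entropy(read), 3), is_low_entropy(read))
```

A homopolymer-like read scores close to 0 bits, while a read using all four bases equally scores 2 bits, so hits in repetitive regions could be down-weighted on that basis.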

CovRadar: continuously tracking and filtering SARS-CoV-2 mutations for genomic surveillance.

Wittig, Alice; Miranda, Fábio; Hölzer, Martin; Altenburg, Tom; Bartoszewicz, Jakub M; Beyvers, Sebastian; Dieckmann, Marius A; Genske, Ulrich; Giese, Sven H; Nowicka, Melania; Richard, Hugues; Schiebenhoefer, Henning; Schmachtenberg, Anna-Juliane; Sieben, Paul; Tang, Ming; Tembrockhaus, Julius; Renard, Bernhard Y; Fuchs, Stephan.

Bioinformatics ; 38(17): 4223-4225, 2022 Sep 02.

Article in English | MEDLINE | ID: covidwho-1922197

ABSTRACT

SUMMARY: The ongoing pandemic caused by SARS-CoV-2 emphasizes the importance of genomic surveillance to understand the evolution of the virus, to monitor the viral population, and to plan epidemiological responses. Detailed analysis, easy visualization and intuitive filtering of the latest viral sequences are powerful for this purpose. We present CovRadar, a tool for genomic surveillance of the SARS-CoV-2 Spike protein. CovRadar consists of an analytical pipeline and a web application that enable the analysis and visualization of hundreds of thousands of sequences. First, CovRadar extracts the regions of interest using local alignment, then builds a multiple sequence alignment, infers variants and consensus and finally presents the results in an interactive app, making accessing and reporting simple, flexible and fast. AVAILABILITY AND IMPLEMENTATION: CovRadar is freely accessible at https://covradar.net, its open-source code is available at https://gitlab.com/dacs-hpi/covradar. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
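The step of inferring variants and consensus from a multiple sequence alignment can be pictured with a small sketch. This is not CovRadar's code: it assumes the alignment is already available as equal-length strings, takes the most frequent character per column as the consensus, and reports positions where a sequence differs from a reference. The function names are hypothetical.

```python
from collections import Counter


def consensus(alignment: list[str]) -> str:
    """Column-wise majority consensus of equal-length aligned sequences."""
    if not alignment:
        return ""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*alignment) iterates over alignment columns.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*alignment)
    )


def variants(reference: str, sequence: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, observed base) for mismatches."""
    if len(reference) != len(sequence):
        raise ValueError("reference and sequence must be aligned to equal length")
    return [
        (pos, ref, obs)
        for pos, (ref, obs) in enumerate(zip(reference, sequence), start=1)
        if ref != obs
    ]


if __name__ == "__main__":
    msa = ["ACGT", "ACGA", "ACGT"]
    cons = consensus(msa)
    print(cons)
    print(variants(cons, msa[1]))
```

Ties between equally frequent characters in a column are resolved arbitrarily here; a production pipeline would need an explicit rule for ties and for gap characters.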

Subject(s)

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Genomics , Mutation

Deep learning-based real-time detection of novel pathogens during sequencing.

Bartoszewicz, Jakub M; Genske, Ulrich; Renard, Bernhard Y.

Brief Bioinform ; 22(6), 2021 Nov 05.

Article in English | MEDLINE | ID: covidwho-1322612

ABSTRACT

Novel pathogens evolve quickly and may emerge rapidly, causing dangerous outbreaks or even global pandemics. Next-generation sequencing is the state of the art in open-view pathogen detection, and one of the few methods available at the earliest stages of an epidemic, even when the biological threat is unknown. Analyzing the samples as the sequencer is running can greatly reduce the turnaround time, but existing tools rely on close matches to lists of known pathogens and perform poorly on novel species. Machine learning approaches can predict if single reads originate from more distant, unknown pathogens but require relatively long input sequences and processed data from a finished sequencing run. Incomplete sequences contain less information, leading to a trade-off between sequencing time and detection accuracy. Using a workflow for real-time pathogenic potential prediction, we investigate which subsequences already allow accurate inference. We train deep neural networks to classify Illumina and Nanopore reads and integrate the models with HiLive2, a real-time Illumina mapper. This approach outperforms alternatives based on machine learning and sequence alignment on simulated and real data, including SARS-CoV-2 sequencing runs. After just 50 Illumina cycles, we observe an 80-fold sensitivity increase compared to real-time mapping. The first 250 bp of Nanopore reads, corresponding to 0.5 s of sequencing time, are enough to yield predictions more accurate than mapping the finished long reads. The approach could also be used for screening synthetic sequences against biosecurity threats.
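To make the idea of classifying incomplete reads concrete, the sketch below shows one plausible way to prepare input for a neural network from a partially sequenced read: keep only the bases available after a given number of cycles, then one-hot encode them with zero padding. The abstract does not describe the authors' preprocessing, so the function names, the A/C/G/T column order and the padding scheme are all assumptions for illustration.

```python
BASES = "ACGT"  # assumed column order for the one-hot encoding


def read_prefix(read: str, cycles: int) -> str:
    """Return the part of a read available after `cycles` sequencing cycles."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return read[:cycles]


def one_hot(seq: str, length: int) -> list[list[int]]:
    """One-hot encode seq into a `length` x 4 matrix.

    Positions beyond the end of seq, and ambiguous bases such as N,
    are encoded as all-zero rows.
    """
    rows = []
    for i in range(length):
        base = seq[i].upper() if i < len(seq) else None
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


if __name__ == "__main__":
    partial = read_prefix("ACGTN", 3)  # only 3 cycles sequenced so far
    for row in one_hot(partial, 4):
        print(row)
```

With a fixed matrix size, the same model input shape can be used at any cycle count, with the unsequenced tail simply left as zeros.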

Subject(s)

COVID-19/genetics , SARS-CoV-2/isolation & purification , COVID-19/virology , Deep Learning , High-Throughput Nucleotide Sequencing , Humans , Nanopores , Neural Networks, Computer , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , Sequence Alignment

Interpretable detection of novel human viruses from genome sequencing data.

Bartoszewicz, Jakub M; Seidel, Anja; Renard, Bernhard Y.

NAR Genom Bioinform ; 3(1): lqab004, 2021 Mar.

Article in English | MEDLINE | ID: covidwho-1069280

ABSTRACT

Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard biosecurity and biosafety alike. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard, homology-based algorithms, cutting the error rates in half and generalizing to taxonomic units distant from those presented during training. Further, we develop a suite of interpretability tools and show that it can also be applied to other models beyond the host prediction task. We propose a new approach for convolutional filter visualization to disentangle the information content of each nucleotide from its contribution to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect regions of interest in novel agents, for example, the SARS-CoV-2 coronavirus, unknown before it caused the COVID-19 pandemic in 2020. All methods presented here are implemented as easy-to-install packages, not only enabling analysis of NGS datasets without requiring any deep learning skills, but also allowing advanced users to easily train and explain new models for genomics.
